Optimality Conditions for Equality Constraints
MATH008 Lesson 10
00:00
Imagine a physical system, like a hanging chain, seeking its lowest energy state. If the endpoints are fixed, the system cannot move freely. Optimality is reached when the internal forces (the gradient of potential energy) are perfectly balanced by the tension forces exerted by the constraints. In convex optimization, this balance is captured by the KKT conditions for equality constraints.

The Geometry of Feasibility

For a convex problem with equality constraints $Ax=b$, a vector $v \in \mathbb{R}^n$ is a feasible direction if $Av = 0$. This means that moving in direction $v$ maintains the constraint: $A(x+v) = Ax + Av = b$ whenever $Ax=b$.

For $x^*$ to be optimal, the directional derivative $\nabla f(x^*)^T v$ must be nonnegative for all feasible directions $v$. But $\mathcal{N}(A)$ is a subspace, so whenever $v$ is feasible, $-v$ is too; the only way both derivatives can be nonnegative is $\nabla f(x^*)^T v = 0$. Thus the gradient $\nabla f(x^*)$ must be orthogonal to $\mathcal{N}(A)$, i.e., it lies in the range of $A^T$, leading us to the existence of Lagrange multipliers.
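A quick numerical sketch of both facts, using NumPy on the toy problem of minimizing $f(x) = \tfrac{1}{2}\|x\|_2^2$ subject to $Ax=b$ (whose minimizer is the minimum-norm solution; the random matrix and seed here are illustrative choices, not from the lesson):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(2, 4))   # p = 2 constraints in R^4
b = rng.normal(size=2)

# Minimizer of f(x) = 0.5 * ||x||^2 subject to Ax = b (minimum-norm solution)
x_star = A.T @ np.linalg.solve(A @ A.T, b)

# Basis for the nullspace N(A) via SVD: feasible directions v satisfy Av = 0
_, _, Vt = np.linalg.svd(A)
N = Vt[2:].T                  # columns span N(A)

# Moving along any feasible direction preserves Ax = b ...
v = N @ rng.normal(size=2)
assert np.allclose(A @ (x_star + v), b)

# ... and at the optimum the gradient (here grad f(x) = x) is
# orthogonal to N(A), up to round-off
print(np.abs(N.T @ x_star).max())
```

The printed residual is at round-off level, confirming that $\nabla f(x^*)$ has no component along any feasible direction.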

The Optimality Condition

A feasible point $x^*$ (i.e., one satisfying $Ax^* = b$) is optimal if and only if there exists a vector $\nu^* \in \mathbb{R}^p$ such that:

$\nabla f(x^*) + A^T \nu^* = 0$

Together with feasibility $Ax^* = b$, this gives $n + p$ equations in the $n + p$ unknowns $(x^*, \nu^*)$, characterizing the equilibrium between the objective's descent direction and the forces exerted by the constraints.
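For a quadratic objective $f(x) = \tfrac{1}{2}x^T P x + q^T x$ these equations are linear and can be solved directly; here is a small sketch (the problem data are randomly generated for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 5, 2
M = rng.normal(size=(n, n))
P = M @ M.T + n * np.eye(n)        # positive definite Hessian
q = rng.normal(size=n)
A = rng.normal(size=(p, n))
b = rng.normal(size=p)

# Stationarity P x + q + A^T nu = 0 and feasibility A x = b, as one system:
#   [P  A^T] [x ]   [-q]
#   [A   0 ] [nu] = [ b]
K = np.block([[P, A.T], [A, np.zeros((p, p))]])
sol = np.linalg.solve(K, np.concatenate([-q, b]))
x_star, nu_star = sol[:n], sol[n:]

grad = P @ x_star + q                          # gradient of f at x*
print(np.linalg.norm(grad + A.T @ nu_star))    # stationarity residual, ~0
print(np.linalg.norm(A @ x_star - b))          # feasibility residual, ~0
```

Both residuals vanish to machine precision: the computed pair $(x^*, \nu^*)$ satisfies $\nabla f(x^*) + A^T \nu^* = 0$ and $Ax^* = b$ simultaneously.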

The Projected Gradient

The Euclidean projection of the negative gradient $-\nabla f(x)$ onto the nullspace $\mathcal{N}(A)$ is given by:

$\Delta x_{\text{pg}} = \text{argmin}_{Au=0} \| -\nabla f(x) - u \|_2$

This vector represents the "best" feasible descent direction. Crucially, $x$ is optimal if and only if $\Delta x_{\text{pg}} = 0$.
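Since $\mathcal{N}(A)$ is a subspace, this projection can be computed in closed form with the orthogonal projector $P_{\mathcal{N}} = I - A^T (A A^T)^{-1} A$ (assuming $A$ has full row rank). A minimal sketch with illustrative random data:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(2, 5))
x = rng.normal(size=5)
grad = 2 * x                       # gradient of the example objective ||x||^2

# Orthogonal projector onto N(A): P_N = I - A^T (A A^T)^{-1} A
P_N = np.eye(5) - A.T @ np.linalg.solve(A @ A.T, A)
dx_pg = P_N @ (-grad)              # projected negative gradient

assert np.allclose(A @ dx_pg, 0)   # a feasible direction: A * dx_pg = 0
print(grad @ dx_pg)                # <= 0: a descent direction (0 only at optimality)
```

The inner product $\nabla f(x)^T \Delta x_{\text{pg}} = -\nabla f(x)^T P_{\mathcal{N}} \nabla f(x) \le 0$, which is exactly why $\Delta x_{\text{pg}} = 0$ certifies optimality.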

The KKT System and Matrix Properties

We can compute the projected gradient $v = \Delta x_{\text{pg}}$ and an associated multiplier estimate $w$ simultaneously using the block system (replacing $I$ with the Hessian $\nabla^2 f(x)$ yields the Newton step instead):

$$\begin{bmatrix} I & A^T \\ A & 0 \end{bmatrix} \begin{bmatrix} v \\ w \end{bmatrix} = \begin{bmatrix} -\nabla f(x) \\ 0 \end{bmatrix}$$

Note on Matrix Spectral Properties: The KKT matrix is symmetric but indefinite. If $A$ has full row rank $p$, the matrix is nonsingular and has exactly $n$ positive and $p$ negative eigenvalues. This indefiniteness reflects the fact that the optimal point $(x^*, \nu^*)$ is a saddle point of the Lagrangian $L(x, \nu) = f(x) + \nu^T(Ax-b)$, rather than a simple local minimum in the combined primal-dual space.
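The stated inertia ($n$ positive, $p$ negative eigenvalues) can be checked numerically; this sketch uses a random full-row-rank $A$ for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 4, 2
A = rng.normal(size=(p, n))        # generically full row rank

# KKT matrix with identity (1,1) block
K = np.block([[np.eye(n), A.T], [A, np.zeros((p, p))]])

eig = np.linalg.eigvalsh(K)        # symmetric matrix -> real eigenvalues
print((eig > 0).sum(), (eig < 0).sum())   # expect n positive, p negative
```

By Sylvester's law of inertia, the positive definite $(1,1)$ block contributes $n$ positive eigenvalues and the Schur complement $-AA^T$ contributes $p$ negative ones, matching the count above.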

🎯 Core Principle
Optimality in equality-constrained problems requires the gradient to be orthogonal to the constraint's nullspace. This allows us to interpret the Newton step as the solution to a linearized approximation of the KKT conditions.